Construction Of Japanese Nominal Semantic Dictionary Using "A NO B" Phrases In Corpora

Authors

  • Sadao Kurohashi
  • Masaki Murata
  • Yasunori Yata
  • Mitsunobu Shimada
  • Makoto Nagao

* Now at Communications Research Laboratory.
† Now at Sharp Corporation.

Abstract

This paper describes a method of constructing a Japanese nominal semantic dictionary, which is indispensable for text analysis, especially for indirect anaphora resolution. The main idea is to use noun phrases of the form "A NO B" (NO is a postposition) in corpora. The two nouns A and B in "A NO B" can have several semantic relations. By collecting "A NO B" phrases from corpora, analyzing their semantic relations, and arranging them for each "B" and each semantic relation, we can obtain a nominal semantic dictionary. The dictionary we constructed by this method from corpora of 130M characters has 22,252 entries, which can be considered practically useful coverage. Our method for analyzing "A NO B" phrases is also original: it uses a thesaurus as an attribute of a decision tree.

1 Introduction

The role of dictionaries is undoubtedly important in Natural Language Processing (NLP). So far, research in NLP has mainly concerned the analysis of individual sentences. Analyzing a sentence means clarifying which of its elements are related to which, and by what relation. Such an analysis requires a verbal semantic dictionary, in other words a case frame dictionary. A case frame dictionary describes what cases each verb has and what kinds of nouns can fill each case slot. Conditions on case slots can be expressed by semantic markers and/or example nouns. For example, a case frame for the verb "YOMU (read)" can be as follows:

YOMU (read)
  agent:  human beings, like KARE (he), KEN (Ken), SENSEI (teacher)
  object: something to be read, like HON (book), SHOSETSU (novel)

Such dictionaries with practically useful coverage have been compiled, mainly by hand, at many institutes and used in many NLP systems (EDR, 1993; NTT, 1997). These days, the main target of NLP has been shifting from individual sentences to a series of sentences, that is, a text.
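The case frame for YOMU above, with slot conditions given both as semantic markers and as example nouns, can be sketched as a small data structure. This is a minimal illustration, not the paper's or any existing dictionary's format: the marker names `HUMAN` and `READABLE`, the `SEMANTIC_MARKERS` lexicon, and the `CaseSlot` class are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical noun -> semantic-marker lexicon. In a real system these
# markers would come from a thesaurus or an existing semantic dictionary.
SEMANTIC_MARKERS = {
    "KARE": {"HUMAN"},
    "KEN": {"HUMAN"},
    "SENSEI": {"HUMAN"},
    "HON": {"READABLE"},
    "SHOSETSU": {"READABLE"},
    "SHINBUN": {"READABLE"},  # newspaper: not among YOMU's listed examples
}


@dataclass
class CaseSlot:
    """One case slot, constrained by a semantic marker and/or example nouns."""
    marker: str
    examples: set[str] = field(default_factory=set)

    def accepts(self, noun: str) -> bool:
        # A noun fills the slot if it is a listed example or carries the marker.
        return noun in self.examples or self.marker in SEMANTIC_MARKERS.get(noun, set())


# The case frame for YOMU (read), mirroring the example in the text.
YOMU = {
    "agent": CaseSlot("HUMAN", {"KARE", "KEN", "SENSEI"}),
    "object": CaseSlot("READABLE", {"HON", "SHOSETSU"}),
}
```

Here `YOMU["object"].accepts("SHINBUN")` is true only through the marker, which shows why markers generalize beyond the listed example nouns, while `YOMU["agent"].accepts("HON")` is false.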
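The arrangement step described in the abstract (collect "A NO B" phrases, determine their semantic relations, then arrange them for each B and each relation) can be sketched as a grouping over analyzed triples. This sketch assumes the relation analysis has already been done; the triples and the relation labels (`possessor`, `topic`, `whole`) are hypothetical and do not reflect the paper's actual relation set or its decision-tree analyzer.

```python
from collections import Counter, defaultdict

# Hypothetical output of an "A NO B" relation analyzer: (A, relation, B).
# The relation labels are illustrative, not the paper's relation set.
ANALYZED_PHRASES = [
    ("KARE", "possessor", "HON"),    # KARE NO HON (his book)
    ("SENSEI", "possessor", "HON"),  # SENSEI NO HON (the teacher's book)
    ("KARE", "possessor", "HON"),    # a second corpus occurrence
    ("REKISHI", "topic", "HON"),     # REKISHI NO HON (a book on history)
    ("KURUMA", "whole", "TAIYA"),    # KURUMA NO TAIYA (a car's tire)
]


def build_nominal_dictionary(triples):
    """Arrange analyzed phrases by head noun B, then by relation,
    counting how often each A was observed with that B and relation."""
    dictionary = defaultdict(lambda: defaultdict(Counter))
    for a, relation, b in triples:
        dictionary[b][relation][a] += 1
    return dictionary


def format_dictionary(dictionary):
    """Render one entry per B, listing example A nouns by frequency."""
    lines = []
    for b in sorted(dictionary):
        lines.append(b)
        for relation, counts in sorted(dictionary[b].items()):
            examples = ", ".join(f"{a}({n})" for a, n in counts.most_common())
            lines.append(f"  {relation}: {examples}")
    return lines


for line in format_dictionary(build_nominal_dictionary(ANALYZED_PHRASES)):
    print(line)
```

With these sample triples, the entry for HON lists KARE(2) and SENSEI(1) under `possessor` and REKISHI(1) under `topic`. Keeping frequencies is one possible design choice; it lets frequent modifiers be ranked first within each relation.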
Human beings use language to communicate, and in most cases, especially in written language, the unit of communication is not a sentence but a text. An NLP system can capture sufficient information only when it handles a text as a whole. Similar to sentence analysis, the main part of text analysis is to clarify the relations among its

Similar resources

Automatic Construction of Nominal Case Frames and its Application to Indirect Anaphora Resolution

This paper proposes a method to automatically construct Japanese nominal case frames. The point of our method is the integrated use of a dictionary and example phrases from large corpora. To examine the practical usefulness of the constructed nominal case frames, we also built a system of indirect anaphora resolution based on the case frames. The constructed case frames were evaluated by hand, ...

A Korean-Japanese-Chinese Aligned Wordnet with Shared Semantic Hierarchy

A Korean-Japanese-Chinese aligned wordnet, “CoreNet”, is introduced. For the purpose of this paper, the term “wordnet” refers to a network of words. It is constructed based on a shared semantic hierarchy that originates from NTT Goidaikei (Lexical Hierarchical System). The Korean wordnet was constructed by assigning a semantic category to every meaning of Korean words in a dictionary. Ver...

Automatic Semantic Sequence Extraction from Unrestricted Non-Tagged Texts

Morphological processing, syntactic parsing and other useful tools have been proposed in the field of natural language processing (NLP). Many of those NLP tools take dictionary-based approaches. Thus these tools are often not very efficient with texts written in casual wording or texts that contain many domain-specific terms, because of the lack of vocabulary. In this paper we propose a simpl...

A Rule-based Morpho-semantic Analyzer of the Japanese Verb Phrases of Simple Sentences

This paper presents the design and algorithms of a morpho-semantic analyzer to parse the Japanese verb phrases of simple sentences. This parser aims to understand the whole semantics of verb phrases by parsing them into semantic units, and thus differs from existing morphological analyzers that primarily segment sentences into morpho-phonemes, labeled with the classifications. Unlike other stat...

Semantic Analysis of Japanese Noun Phrases - A New Approach to Dictionary-Based Understanding

This paper presents a new method of analyzing Japanese noun phrases of the form N1 no N2. The Japanese postposition no roughly corresponds to of, but it has much broader usage. The method exploits a definition of N2 in a dictionary. For example, rugby no coach can be interpreted as a person who teaches technique in rugby. We illustrate the effectiveness of the method by the analysis of 300 test...

Automatic Paraphrasing of Japanese Functional Expressions Using a Hierarchically Organized Dictionary

Automatic paraphrasing is a transformation of expressions into semantically equivalent expressions within one language. For generating a wider variety of phrasal paraphrases in Japanese, it is necessary to paraphrase functional expressions as well as content expressions. We propose a method for paraphrasing Japanese functional expressions using a dictionary with two hierarchies: a morphologic...

Publication date: 1998